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Abstract 

The  paper  focuses  on  Benders  decomposition  techniques  and  Monte  Carlo  sam¬ 
pling  (importance  sampling)  for  solving  two-stage  stochastic  linear  programs  with 
recourse,  a  method  first  introduced  by  George  B.  Dantzig  and  Peter  Glynn  (1990). 
The  algorithm  is  discussed  and  further  developed.  The  paper  gives  a  complete  pre¬ 
sentation  of  the  method  as  it  is  currently  implemented.  Numerical  results  from  test 
problems  of  different  areas  are  presented.  Using  small  test  problems  we  compare 
the  solutions  obtained  by  the  algorithm  with  the  universe  solutions.  We  present 
the  solution  of  large-scale  problems  with  numerous  stochastic  parameters  which  in 
the  deterministic  equivalent  formulation  would  have  billions  of  constraints.  The 
problems  concern  expansion  planning  of  electric  utilities  with  uncertainty  in  the 
availabilities  of  generators  and  transmission  lines  and  portfolio  management  with 
uncertainty  in  the  future  returns. 


*  This  paper  is  an  update  of  Technical  Report  SOL  89-13R,  August  1990,  De¬ 
partment  of  Operations  Research,  Stanford  University,  with  extended  numerical 
results  including  numerical  results  of  large-scale  test  problems. 
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1.  Introduction 

A  stochastic  linear  program  is  a  linear  program  whose  parameters  (coefficients, 
right  hand  sides)  are  uncertain.  The  uncertain  parameters  are  assumed  to  be  known 
only  by  their  distributions.  That  means  that  the  values  of  some  functions  are 
numerical  characteristics  of  random  phenomena,  e.g.  mathematical  expectations  of 
functions  dependent  on  decision  variables  and  random  parameters. 

Suppose  a  function  z  —  E  C(V )  is  an  expectation  of  a  function  C( v"),w  G  Cl. 
V  is  a  random  parameter  which  has  outcomes  t/w.  Cl  is  the  set  of  all  possible  random 
events.  It  can  be  finite,  infinite,  discrete  or  continuous.  In  the  continuous  case  the 
computation  of  the  expected  value  requires  to  solve  the  integral: 

E  C(V)  =  j  C{v“)P(du) 
with  P  being  the  probability  measure. 

In  a  general  case  V  would  consist  of  several  components,  e.g.  V  =  (Vi, . . . ,  Vh) 
with  outcomes  vw  which  we  also  will  denote  only  by  lower  case  letters,  e.g.  v  = 
(i>i, . . . ,  Vh)  and  p(v“)  alias  p(v )  would  denote  the  corresponding  density  function. 
We  assume  the  components  of  V  to  be  independent.  In  addition  we  will  construct 
Cl  by  crossing  the  sets  of  outcomes  for  vector  entry  iq,  i  =  1, . . . ,  h  as 


x  fi2  x  •  •  ‘  x  fth- 


In  this  case  the  above  mentioned  integral  takes  the  form  of  a  multiple  integral: 


E 


C(V)  =  hi  C(v)p(v)dvi  . . .  dvh 


In  the  case  of  Cl  being  discrete  and  finite  the  expectation  can  be  computed  by 
a  multiple  sum: 

EC(V)  =  £---£c(v)p(v). 


Vt  Vh 
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The  main  difficulties  in  stochastic  linear  programming  deal  with  the  evalu¬ 
ation  of  the  multiple  integral  or  the  multiple  sum.  The  numerical  computation 
of  the  expectation  requires  a  large  number  of  function  evaluations  and  each  func¬ 
tion  evaluation  means  a  linear  program  to  be  solved.  Different  approaches  attack 
this  problem,  e.g.  Birge  (1985),  Birge  and  Wets  (1986),  Birge  and  Wallace  (1988), 
Frauendorfer  (1988),  Frauendorfer  and  Kail  (1988),  Ermoliev  (1983),  Kigle  and  Sen 
(1989),  Kali  (1979),  Pereira  et  al.  (1989),  Rockafellar  and  Wets  (1989),  Ruszczynski 
(1986),  Wets  (1984),  and  others.  See  Ermoliev  and  Wets  (1988)  for  references.  We 
follow  the  concept  of  Dantzig  et  al.  (1989)  and  Dantzig  and  Glynn  (1990). 


2.  Two  Stage  Stochastic  Linear  Program 

An  important  class  of  models  concerns  dynamic  linear  programs.  Variables 
which  describe  activities  initiated  at  time  t  have  coefficients  at  time  t  and  t  +  1. 
Deterministic  dynamic  linear  programs  appear  as  staircase  problems.  The  simplest 
staircase  problem  is  that  with  two  stages:  X  denotes  the  first,  Y  the  second  stage 
decision  variables,  A,  b  represent  the  coefficients  and  right  hand  sides  of  the  first 
stage  constraints  and  D,d  concern  the  second  period  constraints  together  with  B 
which  couples  the  two  periods,  c,  /  are  the  objective  function  coefficients. 

In  the  deterministic  case  c,  /,  .4,  b,  B ,  D,  d  are  known  with  certainty  to  the  plan¬ 
ner.  In  the  stochastic  case,  the  parameters  of  the  second  stage  arc  not  known  to 
the  planner  at  time  t  =  1,  but  will  be  known  at  time  t  =  2.  At  time  t  =  1  only 
the  distributions  of  these  parameters  are  assumed  to  be  known.  The  second  stage 
parameters  can  be  seen  as  random  variables  which  get  certain  outcomes  with  cer¬ 
tain  probabilities.  We  denote  a  certain  outcome  of  these  parameters  with  u>  and  the 
corresponding  probability  with  pw,u>  €  D,  the  set  of  possible  outcomes. 
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(1) 


min  Z  =  cX  +  E^ifY") 
s/t  AX  =  b 

-  BUX  +  DY w  =  dw 
X,Y“  >0, 

In  (1)  a  two  stage  staircase  problem  is  transformed  into  a  two  stage  stochastic 
linear  program  and  the  parameters  d  and  B  being  random  variables.  Given  the  two 
stage  stochastic  linear  program  one  wants  to  make  a  decision  X  which  is  feasible 
for  all  scenarios  and  has  the  minimum  expected  costs. 

We  consider  the  case  of  Cl  being  discrete  and  finite,  e.g.  Cl  =  (1,...,A'),  the 
parameter  u>  takes  on  K  values.  Then  we  can  formulate  an  equivalent  deterministic 
problem  to  the  stochastic  linear  problem.  This  is  tractable  if  K  is  small.  For  K 
scenarios  the  deterministic  equivalent  problem  is  given  in  (2). 

min2’=  cX  +  plfY1  + 
s/t  AX 

-BlX+  DY 1 
- B2X 

-bkx 

x,  K\K2 . 

Two  stage  stochastic  linear  programs  were  first  studied  in  Dantzig  (1955)  and 
then  developed  by  many  authors.  The  method  which  we  want  to  apply  here  is  using 
Benders  (1962)  decomposition.  Van  Slyke  and  Wets  (1969)  suggested  to  express  the 
impact  of  the  second  period  by  a  scalar  6  and  “cuts”,  vhich  are  necessary  conditions 
to  the  problem  and  are  expressed  only  in  terms  of  the  first  period  variables  X  and 
8.  Benders  decomposition  splits  the  original  problem  into  a  master  problem  and 
a  subproblem  which  decomposes  into  a  series  of  independent  subproblems,  one  for 
to  each  u>  €  Cl.  According  to  the  L-shaped  method  the  master  problem,  the  sub 
problems  and  the  cuts  are  represented  in  (3),  (4)  and  (5). 


p2fY 2  +  •  •  •  +  pKfYK 


+  DY2 


=  b 
=  d1 
=  d2 


(2) 


- K 


+  DYk  =  dK 
>  0 
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The  master  problem: 


min  zm  —  cX  +  0 

s/t  AX  =  b 

-  G'X  +  cnl9  >  ge,  £  =  \ 
X,6>  0 


The  sub  problems: 

min  =  p“fY“ 

s/t  p"*"  :  DYU  =  d“  +  BUX  (4) 

r>  0,wGfi,  e.j.fi  =  {l,2,...,A'} 

where  p^ir^*  is  the  optimal  dual  solution  of  subproblem  u>. 

The  cuts: 


g  =  w*dw  = 

G  =  Swpw7rw*5w  =  E(tvu*B“)  (5) 

a*  =  0  . . .  feasibility  cut 
a1  =  1 . . .  optimality  cut 

By  solving  the  master  problem  we  obtain  a  solution  X.  Given  -Y  we  can  solve 
K  subproblems  w  G  to  compute  a  cut.  The  cut  is  a  lower  bound  on  the  expected 
value  of  the  second  stage  costs  represented  as  a  function  of  A\  Cuts  are  sequentially 
added  to  the  master  problem  and  new  values  of  X  are  obtained  until  the  optimality 
criterion  is  met.  We  distinguish  between  two  types  of  cuts,  feasibility  cuts  and 
optimality  cuts.  The  first  refers  to  infeasible  subproblems  for  a  given  A’  and  the 
latter  to  feasible  and  optimum  subproblems,  given  X. 

If  the  expected  values  z,G ,  and  g  are  computed  exactly,  that  is,  by  evaluating 
all  scenarios  u  €  Q,  we  refer  to  it  as  the  universe  case.  As  we  will  see  later  the 
number  of  scenarios  easily  gets  out  of  hand  and  it  is  not  always  possible  to  solve  the 
universe  case.  Therefore  methods  are  sought  which  guarantee  a  satisfying  solution 
without  solving  the  universe  case. 
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3.  Monte  Carlo  Sampling 

Each  iteration  of  Benders  decomposition  requires  the  computation  of  expected 
values,  e.g.  the  subproblem  costs,  the  coefficients  and  right  hand  sides  of  the  cuts. 
For  each  outcome  u>  €  fl  a  linear  program  has  to  be  solved.  The  expected  value  of 
the  subproblem  costs  is  denoted  by 

z  =  e  c(vu)  =  e  /y*w,  e  n, 

with  Y *u>  being  the  optimum  solution  of  subproblem  uf.  The  number  of  elements 
of  Q  is  determined  by  the  dimensionality  of  the  stochastic  vector  V  =  (Vf, ...,  Vj,). 
Typically  the  dimension  h  of  V  is  quite  large.  For  example,  in  expansion  planning 
problems  of  electric  power  systems  one  component  of  V  denotes  the  availability  of 
one  type  of  generators  or  one  demand  of  power  in  a  node  of  a  multi- area  supply 
network  or  the  availability  of  one  type  of  transmission  line  connecting  two  nodes. 
Consider  several  nodes  and  arcs  and  one  demand  and  some  options  of  generators 
at  each  node.  The  number  of  scenarios  K  in  the  universe  case  quickly  gets  out  of 
hand,  even  if  the  distribution  of  each  component  of  V  is  determined  by  just  a  small 
number  K'  of  discrete  points.  Suppose  e.g.  h  ~  20  and  Kl  =  5,  i  =  1, . . . ,  20.  Then 
the  total  number  of  t*»rms  in  the  expected  value  calculations  is  K  =  520  «  1014, 
which  is  not  practically  solvable  by  direct  summation.  Monte  Carlo  methods  appear 
promizing  to  compute  multiple  integrals  or  multiple  sums  for  h  large  (Davis  and 
Rabinowitz  (1984)).  See  Hammersly  and  Handscomb  (1964)  for  a  description  of 
Monte  Carlo  sampling  techniques. 

3.1  Crude  Monte  Carlo 

Suppose  v“,u>  =  1, . . .  ,n  are  scenarios,  sampled  independently  from  their  joint 
probability  mass  function,  then  Cw  =  C(vu)  are  independent  random  variates  with 
expectation  z. 

n 

z  =  (1/n)  Y  C“  (6) 

w=l 
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is  an  unbiased  estimator  of  z  and  its  variance  is 


a\  =  a2  fn 
<t2  —  var(C(V)). 

Thus  the  standard  error  is  decreasing  with  sample  size  n  by  n~0  5.  The  con¬ 
vergence  rate  of  z  to  z  is  independent  of  the  dimension  h  of  the  random  vector 

V. 

3.2  Importance  Sampling 

We  rewrite 


z 


=  e  =  E 

Wfctt 


C(yu')p(vu,)q(vuj) 

q(vw) 


by  introducing  a  probability  mass  function  g(i>“').  We  can  see  q  as  a  probability 
mass  function  of  a  random  vector  W,  therefore  by  change  of  variables 


z  =  E 


C{W)p(W) 

q(W) 


We  obtain  a  new  estimator  of  z, 

_  _  1  ^  C(ww)p(wu) 

which  has  a  variance  of 


var(z) 


wen  x 


C{ww)p{wul) 


q{w“). 


Choosing  q*(wul)  —  C(w  )p(w  ) —  WOuld  lead  to  var(z)  =  0,  that  means 

2_L€  nC(-u,W)p('wW) 

we  could  get  a  perfect  estimate  of  the  multiple  sum  just  by  one  single  observation. 
However  this  is  practically  useless,  since  to  sample  C.p/q  we  have  to  know  q  and  to 
determine  q  we  need  to  know  z  =  X^wgn  C(wW)p(wW)->  which  we  eventually  want  to 
compute.  Nevertheless  this  result  helps  to  derive  some  heuristics  of  how  to  choose 
q:  It  should  be  approximately  proportional  to  the  product  C(wUJ)p(u’UJ)  and  have  a 
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form  which  cam  be  integrated  analytically.  For  instance  using  the  additive  (separable 
in  the  components  of  the  stochastic  vector)  approximation: 

h 

C(V)«£Ci(Vj) 

t=l 

could  be  a  possible  way  to  compute  a  proper  q : 

C(ww)p(wui 

Si=l  SwGfJ. 

In  this  case  one  has  to  solve  only  h  1-dimensional  sums  instead  of  1  fo-dimen- 
sional  sum.  Depending  on  how  well  the  additive  model  approximates  the  original 
cost  surface  the  above  mentioned  estimator  will  lead  to  smaller  variances  compared 
to  crude  Monte  Carlo  sampling.  Of  course  if  the  original  cost  surface  has  the 
property  of  additivity  (separability)  no  sampling  is  required,  as  the  multiple  sum  is 
computed  exactly  by  h  1-dimensional  sums. 

The  advantage  of  this  approach  lies  in  the  fact  that  even  if  the  additive  model 
is  a  bad  approximation  to  the  cost  surface  the  method  works.  The  price  that  has 
to  be  paid  is  a  high  sample  size.  The  variance  reduction  compared  to  crude  Monte 
Carlo  will  be  small.  For  the  theory  of  importance  sampling  we  refer  to  Glynn  and 
Iglehart  (1989).  See  also  Dantzig  and  Glynn  (1990). 

R.  Entriken  and  M.Nakayama  in  Dantzig  et.  al.  (1989)  developed  an  impor¬ 
tance  sampling  scheme  using  an  additive  model  to  approximate  the  cost  function 
E  C(V).  Actually  C(v)  is  approximated  by  a  marginal  cost  model,  considering 
marginal  costs  in  each  dimension  t  of  V  and  a  base  case,  the  point  from  which  the 
approximation  is  developed.  We  will  use  this  approach  here.  As  we  employ  impor¬ 
tance  sampling  within  the  Benders  decomposition  algorithm  the  costs  C(v,X),  the 
approximation  of  the  costs  T(u,  X)  and  thus  the  importance  distribution  of  q(v,X) 
depend  also  on  X ,  the  current  solution  of  the  master  problem.  Introducing  the 
costs  of  the  base  case  C(r,  X)  makes  the  model  more  sensitive  to  the  impact  of  the 
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stochastic  variables  V. 


c(v,x)*r(v,x)  =  c(T,x)  +  YMi(vi,x), 

fcr  (7) 

Mi(Vi,X)  =  C(TU...,Ti_uVi,Ti+l,...,Th,X)-C(T,  X). 
t  =  (rj, . . . ,  r^)  can  be  any  arbitrary  chosen  point  out  of  the  set  of  values  vt,  i  = 
1, . . . ,  h.  For  example  we  choose  r,  as  that  outcome  of  Vi  which  leads  to  the  lowest 
costs,  ceteris  paribus.  These  values  can  be  found  easily.  Note  that  the  second  stage 
costs  are  computed  by  a  linear  program,  where  the  uncertain  parameters  appear  in 
right  hand  side.  Therefore  the  second  stage  costs  are  convex  in  the  stochastic  pa¬ 
rameters  V.  The  sign  of  the  dual  variables  associated  with  the  stochastic  parameter 
indicate  the  direction  to  lowest  costs.  In  the  context  of  expansion  planning  of  power 
systems  this  means  selecting  respectively  lowest  demands  and  highest  availabilities 
of  generators  and  transmission  lines. 


Defining 


Mi(X)  =  E  Mi(Vi,X)  =  £ 


ii*> 

Xl.A/iW.JO 


where  we  assume  that 


y]  Mi(vf,X)  >  0.  that  means  at  least  one  M,(v~ ,  X)  >  0, 

i=i 

we  can  express  the  expected  value  of  the  costs  in  the  following  form: 

*(■*)  =  C( T,X)  +  J2  M.(X)  Y,  F(v“,X)^gfP  nP;«).  (10) 

1=1  wen  Mi\X)  J=l 

Note  that  this  formulation  consists  of  a  constant  term  and  a  sum  of  h  expecta¬ 
tions.  Given  a  fixed  sample  size  n  we  partition  n  into  h  sub-samples,  with  sample 
sizes  n,,  i  =  1, . . . ,  h  such  that  E  n,  =  n  and  n,  >  1,  i  —  1, . . . ,  n  and  n,  being  ap¬ 
proximately  proportional  to  A/,.  The  h  expectations  are  separately  approximated 
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by  sampling  using  marginal  densities.  The  i-th  expectation  corresponds  of  course 
to  the  i-th  component  of  V.  Generating  sample  points  in  the  i-th  expectation  we 
use  the  importance  density  (p,Af,/M,)  for  sampling  the  i-th  components  of  V  and 
the  original  marginal  densities  for  any  other  components.  Denoting 

=  -f'v.i)  (ii) 

the  estimate  of  the  i-th  sum,  we  obtain 

h 

z{X)  =  C(t,X)  +  £m,(A>,(  X),  (12) 

1=1 


the  estimated  expected  value  of  the  second  stage  costs  z(A”). 

Let  a?(X)  be  the  estimated  sample  variance  of  the  i-th  expectation,  where 
cr](X)  =  0  if  rii  =  1.  The  estimated  variance  of  the  mean,  o\{X),  is  then  given  by 


j2(  y)  j -M,;(XK2(X) 
.  ni 


Using  importance  sampling  one  can  achieve  significant  variance  reduction.  The 
experiment  of  M.  Nakayama  in  Dantzig  et  al.  (1989)  claims  a  variance  reduction 
of  1:20000  using  importance  sampling  versus  crude  Monte  Carlo  sampling:  For  a 
given  and  optimal  X  the  second  stage  costs  of  a  multi  area  expansion  planning 
model  with  192  universe  scenarios  were  sampled  with  a  sample  size  of  10  using  both 
methods  and  the  results  compared. 

The  derivation  above  concerned  the  estimation  of  the  expected  second  stage 
costs  z(X).  To  derive  a  cut  we  use  the  same  framework  analogously.  Note  that 
a  cut  is  defined  as  an  outer  linearization  of  the  second  stage  costs  represented  as 
a  function  of  X,  the  first  stage  variables.  At  X,  the  value  of  the  cut  is  exactly 
the  expected  second  stage  costs  z(X).  Note  also  that  any  choice  of  q  is  a  valid 
choice.  As  we  do  not  want  to  derive  different  importance  distributions  for  the 
coefficients  and  the  right  hand  side  of  a  cut,  we  use  the  q  already  at  hand  from  the 
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cost  estimation.  Therefore  we  employ  directly  the  cost  approximation  scheme  and 
the  importance  distribution  to  compute  the  gradient  and  the  right  hand  side  of  a 
cut.  With  B(vu)  :=  and  d(vu)  :=  dw  being  the  outcome  of  B  and  d  in  scenarios 
u>,u>  €  ft  and  7r*(uu',  X)  :=  iru*(X),  the  optimum  dual  solution  in  scenario  u,  we 
define 


Fa(v“,X) 


*‘(v“,X)B(iiw)  -  ir*(r,  X)B(t) 


(H) 


and  compute: 


7r*(u",  X)d(vu)  —  7r*(r,  X)d(r) 


(15) 


G(X)  =  *’(r,X)B(r)  +  £  M,(X)  £  r°(V’,X)M£’‘'?X)  jj  P>«)  (16) 

Mi\X)  j=1 


j=1 


u>en 


S<V)  =  »*(T.A)B(r)  +  £*<(*)  ^  F'(«“,je)^|l)  nPi(»p,  (17) 

.=i  u-en  Mt(X)  J=1 

the  coefi.  .ents  and  the  right  hand  side  of  a  cut.  We  estimate  the  expected  values 
again  by  sampling  using  the  same  sample  points  as  at  hand  from  the  computation 
of  z. 


Using  Monte  Carlo  sampling  we  obtain  z(X),G(X),g(X),  which  are  approx¬ 
imations  of  the  expected  values  z(X),  G(X),  g(X)  We  also  obtain  the  estimated 
variance  of  the  mean  of  the  second  stage  costs  <r*(X).  The  impact  of  using  approx¬ 
imations  instead  of  the  exact  parameters  on  the  Benders  decomposition  algorithm 
is  analyzed  in  the  following  section. 


4.  Benders  Decomposition 

In  the  following  we  will  derive  the  main  steps  of  Benders  decomposition  al¬ 
gorithm  for  two  stage  stochastic  linear  programs  considering  the  “universe”  case, 
which  gives  the  exact  solution  of  the  equivalent  deterministic  problem  ( “certainty 
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equivalent”).  We  will  then  analyze  the  impact  of  sampling  of  subproblems  on  Ben¬ 
ders  decomposition.  See  GeofFrion  (1970)  for  a  derivation  of  Benders  decomposition 
algorithm. 

Given  the  equivalent  deterministic  problem  in  (2)  and  assuming  K  scenarios 
describe  the  universe  case,  we  rewrite  the  problem  applying  projection  onto  the  X 
variables  and  obtain  (18).  We  assume  for  simplicity  that  (2)  is  feasible  and  has  a 
finite  optimum  solution. 


minZ  = 

cX  +Min  \p1fYl+p2fY2+---+pKfYK} 

AX  =  b  DY1  =d1+B'X 

X>0  DY 2  =cP+B2X 


DYk=  dK +Bk  X 
YK>  0 


(18) 


r‘,r2 

The  infimal  value  function  in  (18)  corresponds  to  the  following  primal  linear 
problem  (19): 


in  zP  =  PlfY 1  +  p2fY2  +■■■+  pKfYK  =  E“(fYu) 
DY 1  =  dl  +  BlX 

DY 2  =  d2  +  B2X 


mm 

p 17r1  : 
p2  tt2  : 


(19) 


pKnK: 

Y\Y2,  .. 

and  to  the  dual  linear  problem  (20): 
max  zd  — 

p1n1(d1  +  BlX)+p2Tr2(<{2  +B2X)+  ■ 
nl  D 

7 x2D 


DYk  =  dK  +BkX 
,  YK  >  0 


+pKTrK(dK  +  BkX) 


<  f 

<  f 


(20) 


7 TKD  <  f 

The  primal  problem  is  parameterized  in  the  right  hand  side  by  X.  The  as¬ 
sumption  (2)  being  finite  implies  that  (19)  is  finite  for  at  least  one  value  of  X  for 
which  X  >  0  and  AX  —  b.  Applying  the  Duality  Theorem  of  Linear  Programming 
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we  state  that  (20)  has  to  be  feasible.  The  feasibility  conditions 


x“D  -  /  <  0 


indicate  that  the  feasible  region  {xw\xu  D  —  /  <  0}  is  independent  of  X  and  u  and 
just  repeated  for  each  scenario  u  £  ft. 

The  assumption  (2)  being  feasible  requires  feasibility  of  the  primal  problem  (19) 
for  at  least  one  X  for  which  X  >  0  and  AX  =  b.  We  define  x  :=  (x1  ,x2 , . . .  ,xK)  to 
be  the  vector  of  dual  variables  of  problem  (20).  By  the  Duality  Theorem  again  (20) 
has  to  be  finite.  Let  xJ,j  =  1, . . .  ,p  be  the  extreme  points  and  x J ,j  =  p+1,  • .  •  ,p+g 
be  representatives  of  the  extreme  rays  of  the  feasible  region  of  (20),  where  7T-7  := 
(7rlj,  7r2jl, . . . ,  xK>).  Problem  (20)  is  finite  if  and  only  if 


x“i(<r  +  B“X)<0,  j  =p+l,...,p  +  q 


(21) 


u>  £  ft. 

Constraints  (21)  may  be  appended  to  problem  (18)  to  ensure  that  the  problem 
is  bounded. 

Next  we  outer  linearize  the  infimal  value  function  in  (18),  whose  value  is  exactly: 


Maximum  ^  +  Bu>Xy  (22) 

By  expressing  the  infimal  value  fimction  by  the  outer  linearized  dual  problem 

and  using  9  as  the  smallest  upper  bound  the  problem  can  be  represented  in  the 

following  form: 

min  Z  —  cX  4-  9 

AX  =b 

X  >0  (23) 

9  >  Ewen  +  B"X),  j  =  1, . . .  ,p 

x j(du,  +  BwX)<  0,  j  =  p  +  1, . . .  ,p  +  q,  uj  G  ft. 

Relaxation  is  applied  to  solve  problem  (23)  as  we  do  not  want  to  know  all 

7 r*,j  =  l,...,p  +  q  in  advance:  Given  a  solution  ( X,9 )  from  the  master  problem 
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one  solves  problem  (19)  or  problem  (20),  actually  by  solving  the  individual  problems 
(4)  or  the  dual  problems  (24)  of  these: 

z“*(X)  =  max  z“  =^(dw  +  B“X) 

nuD  <  f  (24) 

7TW  >0,  well. 

We  call  7ru'*(X)  the  optimum  dual  solution  vector.  If  primal  infeasibility  or 
dual  unboundness  is  detected,  with  7ru°(X)  denoting  the  corresponding  extreme 
ray,  a  feasibility  cut 

n“\X)  •  (<r  +  Bu X)  <  0  (25) 

is  added  to  the  master  problem.  If  all  primal  problems  are  feasible  or  all  dual 
problems  bounded  an  optimality  cut: 

Q  >  ^  ■  (du  +  buX)  (26) 

is  added  to  the  master  problem.  We  call 

L(X)  :=  ■  (<*“  +  B"X)  (27) 

wen 

an  outer  linearization  of  the  second  stage  costs,  which  axe  defined  by 

z(X)  :=  £  *"•(*).  (28) 

The  relation: 

L(X)  <  z{X)  (29) 

formulates  the  main  property  of  the  outer  linearization.  Any  cut  independent  of  X 
from  which  it  was  originally  derived,  is  a  valid  cut  as  long  as  it  does  not  violate  the 
main  property  of  outer  linearization. 

Benders  decomposition  algorithm  provides  upper  and  lower  bounds  to  the  so¬ 
lution  in  each  iteration. 
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In  the  1-th  iteration 


LB*  :=  cX*  +  0*  (30) 

with  X e,  6l  being  the  optimum  solution  of  the  master  problem  is  defined  to  be  a 
lower  bound  and 

UBe  :=  min {UBe~\cXe  +  z(X1)},  UB°  =  oo  (31) 

with  2(X*)  the  second  stage  costs,  to  be  an  upper  bound  to  the  solution  of  the 
problem.  If 

{UBl  -  LBl)jLBt  <  TOL  (32) 

where  TOL  is  a  given  tolerance,  the  problem  is  said  to  be  solved  with  a  sufficient 
accuracy. 

4.1  Probabilistic  Cuts 

Employing  Monte  Carlo  sampling  techniques  means  not  to  solve  all  problems 
u  G  12,  but  solving  problems  w  €  5,  5  being  a  subset  of  12.  Instead  of  the  exact 
expected  values  z(X),  G(X),  g(X)  we  compute  the  estimates  z(-X),  G( X),  g(X) 
by  importance  sampling.  We  also  estimate  the  error  of  the  estimation  of  z(  X)  by 
the  variance  var(z(X))  =  <r|(X).  Thus  e.g.  in  the  case  of  the  second  stage  costs 
the  estimation  results  in  an  estimated  mean  with  some  error  distribution.  There  is 
good  reason  to  assume  the  error  being  normally  distributed  (Davis  and  Rabinowitz 
(1984)).  We  define  z(X)  to  be  random,  normally  distributed  with  mean  z{X)  and 
variance  cr?(X): 

z(X):=N(z(X),al(X)).  (33) 

A  cut  obtained  by  sampling  differs  in  general  from  a  cut  computed  by  solving  the 
universe  scenarios.  The  outer  linearizations  L{X)  =  G(X)X  +^(A'),  with  respect 
to  the  universe  case  and  L(X)  =  G(X) X  +  <7(-Y),  with  respect  to  the  estimation 
differ  in  the  gradient  and  the  right  hand  side.  At  X  =  X,  where  L( X)  =  r(A')  and 
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L(X )  =  z(X)  we  substitute  the  variable  9  for  z(X)  when  defining  a  cut.  By  this 
substitution  9  takes  on  the  distribution  of  z,  therefore  9  :=  N(9,al).  This  is  only 
true  at  X  =  X .  However,  we  assume  this  error  distribution  to  be  constant  with 
respect  to  X.  That  means  we  see  the  error  mainly  concentrated  in  the  right  hand 
side  of  the  cut  and  we  assign  the  variance  cr*(X)  also  to  the  right  hand  side  and 
define 

g(X)  :=  N(g(X),a:(X))  (34) 

to  be  the  random  right  hand  side  of  the  cut,  normally  distributed  with  mean  g(X) 
and  variance  a?(X).  We  can  expect  that  in  the  final  solution  cuts  will  be  binding 
at  an  X  very  close  to  X,  where  they  were  originally  derived.  The  assumption  of  a 
constant  error  distribution  of  9  is  therefore  intuitively  plausible.  See  also  Dantzig 
and  Glynn  (1990)  in  this  respect.  In  general  5  is  a  sufficiently  large  subset  of  Q  so 
that  the  variance  a\  is  small. 

Cuts  computed  by  sampling  do  not  necessarily  meet  the  condition  of  outer 
linearization.  Violating  this  condition  a  cut  intersects  and  separates  parts  of  the 
feasible  region  of  the  second  stage  problem.  A  sampled  cut  is  therefore  not  a  valid 
cut. 

4.2  Upper  and  Lower  Bounds 

For  random  second  stage  costs  z(Xt)  and  random  right  hand  sides  gl,  £  = 
1, . . . ,  L  the  upper  and  lower  bounds  of  the  problem  as  provided  by  Benders  decom¬ 
position  algorithm  are  probabilistic. 

The  Upper  Bounds 


UBe  :=  cX*  +  z(X‘),  £=l  (35) 

are  random  parameters,  normally  distributed  with  means  U Bl  and  variances 
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(36) 


UBe  :=  NiUB^aKX1)) 

We  define  the  lowest  upper  bound  to  be  the  upper  bound  with  the  lowest  mean 

UB^  :  UBiin  :=  min  {UB '}  (37) 

Z  —  X 

with  corresponding  variance  cr^BX,  • 

min 

The  lower  bounds  are  obtained  from  the  solution  of  the  master  problem.  To 

determine  the  distribution  of  a  lower  bound  consider  the  master  problem  at  iteration 

L:  ~  r  r 

LBl  =  z*M  =  min  zj^  =  c  X  +  9 

s/t  p\:  AX  =  b 

p1  :  -G1  X  +  9  >  g1 

pL  :  -GlX  +9>  gL 
X ,  0>  0 

where  L  optimality  cuts  have  been  added  to  the  originally  relaxed  master  prob¬ 
lem.  We  do  not  consider  feasibility  cuts  for  the  following  argument,  as  they  are 
exact.  The  vector  p°  and  the  scalars  p*,£  =  1, . . . ,  L  denote  the  dual  prices.  The 
right  hand  sides  gl,Z  =  1  are  independent  stochastic  parameters,  normally 

distributed.  We  assume  independence  as  the  cuts  are  generated  from  independent 
samples,  neglecting  the  dependency  that  X1,  Z  =  1, . . . ,  L  are  weakly  connected  by 
Benders  decomposition  algorithm. 

With  the  random  parameters  ge,Z  =  1  in  the  right  hand  side  also  the 

optimum  solution  z*M  will  be  random.  We  define  the  optimum  solution  of  the  master 
problem 

:=  ^(^Mivar(^ML))  (39) 

to  be  a  random  parameter,  normally  distributed,  with  mean  zjyf  and  variance 
var(zj^).  Hence  one  could  experimentally  obtain  the  distribution  of  by  ran¬ 
domly  varying  the  right  hand  sides  according  to  N  samples  j  =  1 ,..., TV  drawn 


from  the  normal  distributions  of  ge,  l  =  1, . . . ,  L  and  by  solving  the  master  problem 
for  all  N  samples.  One  could  estimate  the  mean  and  the  variance  of  the  distribuiton 
from  the  samples  j  =  1, N .  As  this  is  a  very  expensive  way  to  obtain  an  es¬ 
timate  of  the  lower  bound  distribution,  we  proceed  instead  in  the  following  way. 
We  have  already  stated  that  we  choose  a  sample  size  jS(,  such  that  the  variances 
<r|,f  —  1,. . .  ,L  are  small.  If  the  variances  are  small  we  can  assume  that  for  all 
outcomes  of  the  random  right  hand  sides  g*,£  =  1, . . . ,  L,  the  optimum  solution  of 
the  master  problem  has  the  same  basis.  Then  we  can  compute  the  mean  of  the 
lower  bound  estimate: 


Zfoi  =  min  zm  —  c  X  +  9 

s/t  p°  :  AX  =6 

p1  :  ~GlX  +0>  gl 

:  :  <4°) 

pL  :  -GLX  +  0  >  gL 

X ,  0>  0 

by  substituting  the  means  gl,£  =  1,...,L  for  the  random  parameters  g(,£  = 
1, . . . ,  L,  and  the  variance  var(zj^)  by  using  the  dual  solution: 

^(zm)  =  X^var^)  =  (41) 

*=i  *=i 

As  the  lower  bound  means  increase  monotonically  with  the  number  of  iterations, 
we  obtain  the  largest  lower  bound  by  LBl  =  z*^  and  LBl  :=  N(LBL ,  var(LBL)). 


4.3  Stopping  Rule 

The  analogy  to  the  deterministic  Benders  decomposition  algorithm  we  stop,  if 
the  upper  and  lower  bound  are  sufficiently  close.  In  the  case  of  probabilistic  bounds, 
the  algorithm  has  to  be  stopped,  if  the  upper  and  lower  bound  are  indistinguishable 
in  distribution.  Based  on  the  idea  of  George  B.  Dantzig,  we  check  this  condition  by 
using  students-t-test  to  determine  if  s'  >  0  with  95%  probability,  where 

s*  =  UBl  -  LB 1  +  TOL  (42) 
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and  TOL  being  a  given  tolerance. 

The  employment  of  students-t-test  requires  independency  of  the  upper  and 
lower  bound  distributions.  As  independency  is  not  ensured  in  the  first  place  as 
an  upper  bound  and  a  binding  cut  in  the  master  problem  could  be  obtained  from 
the  same  set  of  samples,  we  obtain  independency  by  resampling  the  lowest  upper 
bound  before  employing  students-t-test.  The  X  corresponding  to  the  lowest  upper 
bound  and  the  corresponding  importance  distribution  have  to  be  stored.  If  upper 
and  lower  bounds  are  close  to  each  other,  which  is  checked  by  using  students-t-test 
without  fulfilling  the  independence  requirement,  we  use  new  samples  to  compute 
an  independent  upper  bound.  Now  we  check  if  sl  >  0  by  students-t-test. 

4.4  Confidence  Interval 

After  passing  the  students-t-test  in  the  last  iteration,  which  means  that  the 
upper  and  lower  bound  means  are  indistinguishable,  we  obtain  the  optimum  solution 
XL ,6  from  the  master  problem.  We  derive  from  the  distributions  LB1  and  U Bl 
a  95%  confidence  interval:  On  the  left  side  by  using  the  lower  bound  distribution 
and  on  the  right  side  by  using  the  upper  bound  distribution.  We  define 

Cleft  -  196^var(LBL) 

Cright  =  L96\/var(UBL) 
and  obtain  the  confidence  interval 

LB  ~  Cleft  <  Z*  <UB  +  Cright 

for  the  final  solution  Z*. 

(Cieft  +  CrighO/LBL  <  Ctoi,  where  Ctoi  is  a  predefined  quality  criteria  for 
the  confidence  interval,  the  obtained  solution  is  satisfying.  Otherwise  the  sample 
size  has  to  be  increased  and  the  problem  has  to  be  solved  again  with  the  increased 
sample  size. 


(43) 


(44) 
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4.5  Improvement  of  the  Solution 

Suppose  the  solution  with  a  certain  sample  size  was  not  satisfying.  Instead 
of  starting  from  the  beginning  with  an  increased  sample  size  we  want  to  use  the 
information,  which  we  have  already  collected.  To  do  this,  we  look  for  the  binding 
cuts  in  the  final  solution,  increase  the  sample  size  and  recompute  the  binding  cuts 
at  the  same  X e,  they  were  originally  computed.  This  of  course  means  that  one  has 
to  store  the  values  of  X1  and  the  associated  importance  distributions  or  recompute 
the  latter.  The  enlarged  sample  size  leads  to  smaller  variances  of  the  binding  cuts 
and  eventually  to  a  smaller  confidence  interval  of  the  final  solution.  Berry-Esseen, 
e.g.  Hall  (1979),  give  upper  bounds  on  the  rates  of  convergence  in  the  central  limit 
theorem.  Solving  the  master  problem  again  with  the  improved  binding  cuts  will 
in  general  not  result  in  an  intersection  of  the  lower  and  upper  bound.  Therefore 
some  more  iterations  are  necessary  to  obtain  the  optimal  solution  according  to  the 
increased  sample  size.  This  improvement  procedure  could  be  employed  iteratively 
until  a  satisfying  solution  is  obtained.  It  is  a  possible  way  to  improve  a  non  satisfying 
solution.  It  may  not  be  very  efficient  and  there  may  be  better  ways  to  do  so. 
In  general  we  choose  a  sample  size  such  that  the  obtained  confidence  interval  is 
satisfying.  We  can  state  now  the  algorithm  as  follows: 

4.6  The  Algorithm 
Step  0  Initialize: 

e  =  o,uB°  =  oo. 

Step  1  Solve  the  relaxed  master  problem  and  obtain  a  lower  bound: 

Lb 1  =  cx  +  el. 

Step  2  £  =  £  +  1 

Solve  subproblems  and  obtain  an  upper  bound: 

UBe  =  min {UB*-1  ,cXl  +  z( X *)},  compute  and  add  a  cut  to  the  master 
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problem,  using  Monte  Carlo  (importance)  sampling. 

Step  3  Solve  the  master  problem  and  obtain  a  lower  bound: 

lb 1  =  cxl  +  ee. 

Step  4  s  =  UBe  -  LB1  +  TOL 

Check  if  s  >  0  using  students-t-test. 

Step  5  Compute  confidence  interval  and  obtain  a  solution:  Z*,X,d.  Stop. 

Improvement  of  the  solution: 

Step  6  If  (C|e£t  +  C^ght)/ LB  <  Ctoi,  stop, 
otherwise  got  to  Step  7 

Step  7  Increase  sample  size  and  initialize  UB°  =  oo. 

Step  8  Recompute  binding  cuts. 

Upper  bound:  UBl  —  min{17 Be~1 ,  CX  +  z(Xt). 

Step  9  Go  to  step  3 

5.  Numerical  Results 

The  method  has  been  implemented.  The  Fortran  code  for  solving  general  large- 
scale  two-stage  stochastic  linear  problems  with  recourse  using  Benders  decomposi¬ 
tion  and  importance  sampling  uses  MINOS  (Murtagh  and  Saunders  (1983)),  which 
has  been  adapted  for  this  purpose,  as  a  subroutine  for  solving  the  linear  programs 
of  the  master-problem  and  the  sub-problems.  Alternativly  the  code  can  also  use  a 
modified  version  of  Tomlin’s  (1973)  LPM1  code  of  the  revised  simplex  method  as 
a  subroutine.  Versions  of  the  code  are  installed  on  several  computers,  like  on  the 
IBM-3090,  the  Microvax-workstation,  and  on  Personal  Computers.  All  following 
test  results  are  computed  on  a  Toshiba  laptop  personal  computer  T5200.  First  we 
present  an  illustrative  example,  a  toy  problem  of  expansion  planning  of  power  sys¬ 
tems  which  we  discuss  in  detail.  Then  we  derive  numerical  results  from  other  small 
test  problems.  Eventually  we  demonstrate  the  solution  of  large-scale  test  problems 
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with  numerous  stochastic  parameters. 

The  illustrative  example,  test  problem  APL1P  is  a  model  of  a  simple  power 
network  with  one  demand  region.  There  are  two  generators  with  different  invest¬ 
ment  and  operating  costs,  and  the  demand  is  given  by  a  load  duration  curve  with 
three  load  levels:  base,  medium  and  peak.  We  index  the  generators  with  j  =  1,2, 
and  the  demands  with  i  =  1,2,3.  The  variables,  x^  j  =  1,2,  denote  the  capacities 
which  can  be  built  and  operated  to  meet  demands  di,  i  =  1,2,3.  The  variable  yij 
denotes  the  operating  level  for  generator  j  in  load  level  i  with  operating  cost  fl}. 
The  variable  t/js  defines  the  unserved  demand  in  load  level  i  which  can  be  purchased 
with  penalty  cost  f{s  >  fij.  The  subscript  s  is  not  an  index,  but  denotes  only  an 
unserved  demand  variable.  The  per-unit  cost  to  build  generator  j  is  Cj.  Finally,  the 
model  is  formulated  with  complete-recourse,  which  means  that  at  any  given  choice 
of  x  demand  is  satisfied  for  all  outcomes.  In  this  model,  building  new  generators 
competes  with  purchasing  unserved  demand  through  the  cost  function,  yet  there  is 
a  minimum  capacity  bi  which  has  to  be  built  for  each  load  level.  The  availabilities  of 
the  two  generators,  /?j,  j  =  1,2,  and  the  demands  in  each  load  level,  d{,  i  =  1,2,3, 
are  uncertain.  Generator  one  has  four  possibilities,  while  generator  two  has  five, 
and  each  demand  has  four.  All  of  the  data  values  are  given  in  Table  1  and  the 
problem  can  be  formulated  as  follows: 


min]Cj= !  CjXj  +  £{Zj=iE?=i  fxjV'tj  +  £i=i  fisVi’,} 

s/t  x}  >b}  j-  1,2 

+  T.Uvij  <0,  .7  =  1,2 

£>=i  Vij  +  y?s  >d?,i  =  1,2,3 

xj,  y?j,  y?*  j  =  !>2, 

i  =  1,2,3. 
(45) 


We  will  take  cj  €  ft  when  solving  the  universe  problem  and  cj  E  S  when  solving 
a  problem  with  sampling. 
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Generator  Capacity  Costs  (106$/(MW,  a 

0) 

Cl 

=  0.4,  c2  =  0.25 

Generator  Operating  Costs  (106$/MW,  a) 

hi 

=  0.43  /21  = 

0.87 

fl2 

=  0.20  /22  = 

0.40 

/l3 

=  0.05  /23  = 

0.10 

Unserved  Demand  Penalties  (106$/MW, 

a) 

fla 

II 

< 

II 

< 

II 

1.0 

Minimum  Generator  Capacities  (MW) 

&i  =  &2  =  1000 

Demands  (MW) 
# 

1  2 

3 

4 

Outcome 

900  1000 

1100 

1200 

Probability 

0.15  0.45 

0.25 

0.15 

Availabilities  of  Generators 

Generator  1  (0i) 
# 

1  2 

3 

4 

Outcome 

1.0  0.9 

0.5 

0.1 

Probability 
Generator  2  (/?2) 

0.2  0.3 

0.4 

0.1 

# 

1  2 

3 

4 

5 

Outcome 

1.0  0.9 

0.7 

0.1 

0.0 

Probability 

0.1  0.2 

0.5 

0.1 

0.1 

Table  1:  APL1P,  test  problem  data. 
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The  number  of  possible  demands  and  availabilities  results  in  4  *  5  *  43  =  1280 
possible  outcomes  in  fi,  and  thus  1280  subproblems  have  to  be  solved  in  each  it¬ 
eration  of  Benders  decomposition  for  the  universe  case.  We  compare  the  universe 
solution  with  solutions  gained  by  the  importance  sampling  algorithm.  Table  2  shows 
the  results  in  the  case  of  20  samples  out  of  the  possible  1280  combinations  and  with¬ 
out  an  improvement  phase.  100  replications  of  the  same  experiment  with  different 
seeds  were  run  to  get  statistical  information  about  the  accuracy  of  the  solution 
and  the  estimated  confidence  interval.  The  mean  over  the  100  replications  of  the 
objective  function  value  (total  costs)  differs  from  the  universe  solution  by  0.3%. 
From  the  distribution  of  the  optimum  objective  function  value  derived  from  the  100 
replications  of  the  experiment  a  95%  confidence  interval  is  computed:  plus  minus 
2.1%.  In  each  replication  a  95%  confidence  interval  of  the  solution  is  estimated. 
The  mean  over  all  replications  of  the  estimated  confidence  interval  is  on  the  left 
side  1.5%  and  on  the  right  side  1.9%.  In  the  worst  case  an  objective  function  value 
of  26233.9  was  computed.  This  is  about  6.4%  off  the  correct  answer.  The  estimated 
95%  confidence  interval  in  this  case  did  not  cover  the  correct  answer.  The  coverage 
rate  of  90%  expresses  that  in  90%  of  the  100  replications  the  correct  answer  of  the 
universe  solution  is  covered  by  the  estimated  confidence  interval.  This  shows  that  if 
using  a  sample  size  of  20  we  are  slightly  underestimating  the  confidence  interval:  if 
the  computation  of  the  95%  confidence  interval  was  exact,  we  would  expect  a  cover¬ 
age  rate  of  95%.  The  reason  for  the  underestimation  of  the  95%  confidence  interval 
in  the  case  of  sample  size  20  lies  in  the  underlying  assumptions  of  the  estimation 
method,  e.g.  constant  error  distribution  along  a  cut,  same  basis  for  all  outcomes  of 
the  random  right  hand  sides  of  the  cuts.  Especially  the  latter  assumption  is  only 
true,  if  the  variances  are  small.  A  larger  sample  size  reduces  the  variances  and 
we  expect  a  better  coverage  rate  of  the  95%  confidence  interval.  The  bias  and  the 
confidence  interval  of  the  optimum  strategies  (the  loads  x  to  be  installed)  are  larger 
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than  those  of  the  optimum  objective  function  value.  The  objective  function  near 
the  optimal  solution  appears  to  be  flat:  several  different  strategies  lead  close  to  the 
optimum  costs.  Confidence  intervals  of  about  57%  and  52%  are  computed.  In  the 
above  example  a  sample  size  of  20  was  chosen.  Note  that  additional  computational 
effort  is  also  needed  to  obtain  the  importance  distribution,  e.g.  17  subproblems 
have  to  be  solved  in  each  iteration  to  obtain  the  marginal  costs  Compared  to 
the  universe  solution  the  method  e.g.  achieves  with  about  2.9%  of  computational 
effort  a  solution  which  is  with  95%  confidence  within  an  interval  of  plus  minus  2.1% 
of  the  correct  answer.  Importance  sampling  seems  to  be  a  promising  approach  to 
solving  stochastic  linear  programs.  Table  3  represents  the  results  when  using  200 
samples:  One  can  see  decreasing  bias,  decreasing  — -'fidence  intervals  and  improving 
estimations  of  the  confidence  interval''  with  increased  sample  size.  The  coverage  of 
the  95%  confidence  interval,  computed  by  100  replications  of  the  experiment  with 
different  seeds,  is  now  95%. 

We  investigate  the  performance  of  the  algorithm  on  two  other  examples  which 
are  small  enough  to  compute  the  universe  solution.  PGP2,  derived  from  Louvaux 
(1988),  is  a  power  generation  planning  model  used  to  determine  the  capacities  of 
various  types  of  equipment  required  to  ensure  that  consumer  demand  is  met.  The 
demands  in  3  demand  regions  are  stochastic  and  represented  by  discrete  random 
variables  with  9,  9  and  8  outcomes.  CEPl  is  a  capacity  planning  model  for  a 
manufacturing  plant  in  which  several  parts  are  produced  on  several  machines.  If 
the  demand  for  the  parts  exceeds  the  production  capability  the  residual  parts  are 
purchased  from  external  sources  at  a  price  much  higher  than  the  production  costs 
to  meet  the  demand.  There  are  3  stochastic  parameters  (demands  for  parts),  with 
discrete  and  uniform  distribution  with  10  outcomes  each.  The  formulations  and 
data  for  CEPl  and  PGP2  may  be  found  in  Higle,  Sen  and  Yakowitz  (1990). 

In  the  case  of  PGP2  we  obtained  very  accurate  results  using  a  sample  size  of 


25 


50.  By  computing  100  replications  of  the  experiment  the  mean  of  the  objective 
function  values  differs  0.1%  from  the  correct  answer.  The  95%  confidence  interval 
of  the  objective  function  value,  computed  by  the  100  replications  of  the  experiment 
is  ±0.76%,  the  mean  of  the  confidence  intervals  estimated  in  each  replication  is  on 
the  left  side  0.62%  and  on  the  right  side  0.9%.  In  98%  the  correct  solution  is  covered 
by  the  95%  confidence  interval.  In  the  worst  case  the  solution  differed  by  0.77% 
from  the  correct  answer  and  was  not  covered  by  the  95%  confidence  interval. 

In  the  case  of  CEPl  a  higher  sample  size  is  needed  to  obtain  accurate  results. 
The  estimation  of  the  second  stage  costs  appears  to  be  more  difficult.  The  reason  lies 
in  the  fact  that  the  (penalty)  costs  of  buying  parts  from  external  sources  are  much 
higher  than  the  costs  of  production.  For  this  problem  the  additive  approximation 
function  is  not  a  very  good  approximation  to  the  true  cost  function  as  it  does  not 
cover  the  very  high  costs  in  scenarios  where  all  3  demands  are  high.  The  estimated 
confidence  interval  seems  to  be  large,  we  computed  4.65%  on  the  left  side  and  4.62% 
on  the  right  side  (mean  over  100  replications  of  the  experiment).  The  estimations 
of  the  confidence  interval  are  accurate  as  indicated  by  the  coverage  rate  of  95%  of 
the  correct  answer  by  the  95%  confidence  interval.  In  the  worst  case  a  difference  of 
8.07%  of  the  objective  function  value  to  the  correct  answer  is  computed.  The  worst 
case  solution  is  not  covered  by  the  estimated  confidence  interval.  In  this  examples 
it  is  easier  to  compute  the  value  of  the  first  stage  variables  than  to  estimate  the 
second  stage  costs.  In  most  cases  the  correct  answer  of  the  first  stage  variables  was 
obtained.  We  have  developed  methods  which  adaptively  improve  the  approximation 
function  if  sample  information  shows  that  the  variance  of  the  estimation  is  too  large. 
The  discussion  of  the  adaptive  approach  is  not  subject  of  this  paper.  Table  4  and 
Table  5  represent  the  computational  results  of  PGP2  and  CEPl  and  show  the  sizes 
of  the  test  problems. 

In  the  following  we  report  on  the  solution  of  some  large  test  problems  with 
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several  stochastic  parameters,  which  are  too  big  to  be  solved  by  computing  the 
universe  solution. 

WRPM  is  a  prototype  multi-area  capacity  expansion  planning  problem  for  the 
western  USA  and  Canada.  The  model  is  detailed  covering  6  regions,  3  demand 
blocks,  2  seasons,  and  several  kinds  of  generation  and  transmission  technologies. 
The  objective  is  to  determine  optimum  discounted  least  cost  levels  of  generation 
and  transmission  facilities  for  each  region  of  the  system  over  time.  The  model  min¬ 
imizes  the  total  discounted  costs  of  supplying  electricity  (investment  and  operating 
costs)  to  meet  the  exogenously  given  demand  subject  to  expansion  and  operating 
constraints.  A  description  of  the  model  can  be  found  in  Dantzig  et  al.  (1989) 
and  Avriel,  Dantzig  and  Glynn  (1989).  In  the  stochastic  version  of  the  model 
the  availabilities  of  generators  and  transmission  lines  and  demands  are  subject  to 
uncertainty.  There  are  13  stochastic  parameters  per  time  period  (8  stochastic  avail¬ 
abilities  of  generators  and  transmission  lines  and  5  uncertain  demands)  with  discrete 
distributions  with  3  or  4  outcomes.  The  operating  sub-problems  in  each  period  are 
stochastically  independent.  The  test  problem  WRPM1  covers  a  time  horizon  of  1 
future  period,  WRPM2  covers  2  future  periods.  There  are  differences  in  the  pa¬ 
rameters  between  WRPMl  and  WRPM2.  Note  that  in  the  deterministic  equivalent 
formulation  the  problem  would  have  more  than  1.5  billion  (WRPMl)  and  more 
than  3  billion  (WRPM2)  equations. 

FI2  is  a  portfolio  management  test  problem,  formulated  as  a  network  problem. 
It  is  a  modified  version  of  test  problems  found  in  Mulvey  and  Vladimirovi  (1989). 
The  problem  is  to  select  a  portfolio  which  maximizes  expected  returns  in  future 
periods  taking  into  account  the  possibility  of  revising  the  portfolio  in  each  period. 
There  axe  also  transaction  costs  and  bounds  on  the  holdings  and  turnovers.  The 
test  problem  FI2  covers  a  planning  horizon  of  two  future  periods.  The  returns  of 
the  stocks  in  the  two  future  periods  are  stochastic  parameters.  The  problem  is 
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formulated  as  a  2-stage  problem.  Rather  than  solving  the  problem  by  looking  at  a 
certain  number  of  preselected  scenarios  (18  to  72  in  case  of  Mulvey  and  Vladimirou) 
we  instead  assumed  the  returns  of  the  stocks  in  the  future  periods  being  independent 
random  parameters,  discretely  distributed  with  3  outcomes  each.  As  there  are 
13  stocks  with  uncertain  returns,  the  problem  has  26  stochastic  parameters.  The 
universe  number  of  scenarios  (2.5  ■  1012)  is  very  large,  so  that  the  deterministic 
equivalent  formulation  of  the  problem  has  more  than  1014  rows.  The  stochastic 
parameters  appear  in  the  B-matrix  as  well  as  in  the  D-matrix. 

Computational  results  of  the  large-scale  test  problems  are  represented  in  Table 
6.  Besides  the  solution  of  the  stochastic  problems  Table  6  also  shows  the  results 
from  solving  the  expected  value  problem.  In  this  case  the  stochastic  parameters  are- 
substituted  by  their  expectations  to  obtain  a  deterministic  problem.  The  expected 
value  solution  is  then  used  as  a  starting  point  for  the  stochastic  solution.  We  also 
report  on  the  estimated  expected  costs  of  the  expected  value  solution.  These  are  the 
total  expected  cost  which  would  occur  if  the  expected  value  solution  is  implemented 
in  a  stochastic  environment.  The  objective  function  value  of  the  true  stochastic  so¬ 
lution  has  to  lie  between  the  objective  function  value  of  the  expected  value  solution 
and  the  expected  costs  of  the  expected  value  solution. 

In  case  of  WRPM1  and  WRPM2  we  chose  a  sample  size  of  100.  The  estimate  of 
the  objective  function  value  of  the  stochastic  solution  (289644.2  in  case  of  WRPM1 
and  143109.2  in  case  of  WRPM3)  turns  out  to  be  amazingly  accurate.  The  95%  con¬ 
fidence  interval  is  computed  as  0.0913%  on  the  left  side  and  0.063%  on  the  right  side 
(WRPM1)  and  0.0962%  on  the  left  side  and  0.1212%  on  the  right  side  (WRPM2). 
Thus  the  objective  function  value  of  the  stochastic  solution  lies  with  95%  proba¬ 
bility  within  289379.7  <  2*  <  289826.0  (WRPM1)  and  142971.5  <  2*  <  143282.6 
(WRPM2).  In  both  cases  the  expected  costs  of  the  expected  value  solution  and 
the  expected  costs  of  the  stochastic  solution  differ  significantly.  The  solution  time 
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on  a  Toshiba  T5200  laptop  PC  with  80387  mathematic  coprocessor  was  75  minutes 
(WRPM1)  and  187  minutes  (WRPM2).  During  this  time  about  7500  (WRPMl) 
and  15700  (WRPM2)  subproblems  (linear  programs  of  the  size  of  302  rows  and  289 
columns)  get  solved. 

A  sample  size  of  200  has  been  chosen  for  solving  test  problem  FI2.  The  problem 
gets  solved  in  only  4  iterations.  The  objective  function  value  of  the  stochastic 
solution  is  computed  as  1.1695  with  a  95%  confidence  interval  of  0.454%  on  the  left 
side  0.371%  on  the  right  side.  Thus  with  95%  probability  the  optimal  solution  lies 
between  1.164  <  z*  <  1.174.  The  estimated  expected  costs  of  the  expected  value 
solution  (1.172)  lie  within  the  95%  confidence  interval  of  the  costs  of  the  stochastic 
solution,  however  also  in  this  case  expected  costs  of  the  expected  value  solution  and 
expected  costs  of  the  stochastic  solution  differ  significantly. 

6.  Conclusion 

We  have  discussed  a  promising  approach  to  solving  two  stage  stochastic  linear 
programs  with  recourse  and  obtained  first  numerical  results:  employing  importance 
sampling  within  the  Benders  decomposition  algorithm  we  got  very  accurate  solu¬ 
tions  to  the  test  problems  with  only  a  small  sample  size.  The  technique  enables 
us  to  solve  large-scale  problems  with  a  large  number  of  stochastic  parameters  on  a 
laptop  computer.  The  test  problems  solved  so  far  include  up  to  26  stochastic  pa¬ 
rameters  and  the  sub-problems  have  a  size  of  several  hundred  rows  and  columns.  In 
the  deterministic  equivalent  formulation  the  problems  would  have  more  than  several 
billions  of  equations.  The  small  confidence  intervals  of  the  solutions  indicate  that 
an  extension  to  even  more  stochastic  parameters  is  possible.  The  analysis  in  this 
paper  concentrated  on  discrete  distributions.  The  method,  however,  can  be  easily 
extended  to  continuous  distributions.  Current  research  concentrates  on  testing  the 
technique  on  large-scale  problems  of  different  areas  with  large  numbers  of  stochastic 
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parameters.  We  investigate  possibilities  to  adaptively  improve  the  approximation 
function,  if  it  turns  out  that  the  error  of  the  estimate  exceeds  a  predefined  level. 
If  some  problems  require  a  much  higher  sample  size  the  use  parallel  processors  will 
enable  us  to  quickly  solve  large  numbers  of  samples  to  obtain  low  variances  of  the 
estimations.  A  parallel  implementation  of  the  method  is  in  preparation. 
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Table  2:  Model  APL1P,  20  samples  (100  replications  of  the  experiment) 


correct 

mean 

95%  conf 
% 

bias 

% 

#univ 

1280 

#iter 

7.6 

G1 

1800.0 

1666/5 

57.0 

-  7.4 

G2 

1571.4 

1732.5 

52.5 

theta 

13513.7 

13729.4 

21.3 

1.6 

obj 

24642.3 

24726.7 

2.1 

0.3 

est.  conf  (%) 

left 

1.5 

est.  conf  (%) 

right 

1.9 

coverage 

0.90 

Table  3:  Model  APL1P,  200  samples  (100  replications  of  the  experiment) 


correct 

mean 

95%  conf 
% 

bias 

% 

#univ 

1280 

#iter 

7.9 

Gl 

1800.0 

1728.7 

31.5 

-  4.0 

G2 

1571.4 

1681.7 

29.2 

7.0 

theta 

13513.7 

13554.7 

12.2 

0.3 

obj 

24642.3 

24673.8 

0.4 

0.1 

est.  conf  (%) 

left 

0.4 

est.  conf  (%) 

right 

0.7 

coverage 

0.95 
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Table  4:  Model  PGP2,  50  samples  (100  replications  of  the  experiment) 


#univ 

#iter 

correct 

648 

mean 

9.1 

95%  conf 
% 

bias 

% 

obj 

392.2 

392.5 

0.76 

0.1 

est.  conf  (%) 

left 

0.62 

est.  conf  (%) 

right 

0.9 

coverage 

0.98 

comp,  time  (min) 

0.28 

Problem  Size 

Master: 

rows 

3 

columns 

7 

nonzeros 

16 

Sub: 

rows 

8 

columns 

16 

nonzeros 

52 

Table  5:  Model  CEP1,  200  samples  (100  replications  of  the  experiment) 


correct 

mean 

95%  conf 
% 

bias 

% 

#univ 

1000 

#iter 

6.4 

obj 

57790.7 

58832.7 

4.63 

1.8 

est.  conf  (%) 

left 

4.65 

est.  conf  (%) 

right 

4.62 

coverage 

0.95 

comp,  time  (min) 

0.28 

Problem  Size: 

Marks: 

rows 

12 

columns 

10 

nonzeros 

36 

Sub: 

rows 

9 

columns 

16 

nonzeros 

53 
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Table  6:  Large  test  problems:  computational  results 


WRPM1 

WRPM2 

FI2 

#  iter  stoch.  (exp.  val.) 

139  (82) 

131  (83) 

4(2) 

sample  size 

100 

100 

200 

exp.  val.  solution  obj 

286323.2 

140041.0 

1.0766 

exp.  val.  solution,  exp.  cost 

295473.7 

147227.3 

1.172 

stochastic  solution 

289644.2 

143109.2 

1.169 

est.  conf.  left  % 

0.0913 

0.0962 

0.454 

corr  ■  Aht  % 

0.063 

0.1212 

0.371 

<-o1i.:ion  time  (min) 

75 

187 

2 

Problem  Size 

Master  rows 

44 

86 

48 

columns 

76 

151 

33 

nonzeros 

153 

334 

130 

Sub  rows 

302 

302 

61 

columns 

289 

289 

45 

nonzeros 

866 

866 

194 

#  univ.  scenarios 

5038848 

10077696 

2.5  •  1012 
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